cuda.core.system: Add basic Nvlink and Utilization support #1918
mdboom merged 7 commits into NVIDIA:main
rwgk left a comment
Generated with the help of Cursor GPT-5.4 Extra High Fast
Manually verified.
Medium: Invalid NVLink indices are accepted and fail late
Device.nvlink() currently accepts negative or out-of-range link indices and
returns NvlinkInfo without validating them first. That differs from existing
indexed accessors such as Device.fan(), which validate eagerly. In practice,
device.nvlink(-1) constructs successfully and only fails later when a
property such as .version is accessed, which turns a basic argument error
into a delayed runtime failure.
Relevant paths:
cuda_core/cuda/core/system/_device.pyx:585
cuda_core/cuda/core/system/_device.pyx:683
cuda_core/cuda/core/system/_nvlink.pxi
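A minimal sketch of the eager validation being asked for, in plain Python. `DeviceSketch`, `NvlinkInfoSketch`, and the value of `NVLINK_MAX_LINKS` are hypothetical stand-ins for illustration; they are not the classes or constants in `cuda.core.system`, and the real fix may look different.

```python
# Hypothetical stand-in for nvml.NVLINK_MAX_LINKS; the real value may differ.
NVLINK_MAX_LINKS = 18


class NvlinkInfoSketch:
    """Illustrative placeholder for NvlinkInfo; holds only the link index."""

    def __init__(self, device, link):
        self.device = device
        self.link = link


class DeviceSketch:
    """Illustrative placeholder for Device, showing validation at call time."""

    def nvlink(self, link):
        # Reject bad indices here, so the caller sees an argument error
        # immediately instead of a failure on a later property access.
        if isinstance(link, bool) or not isinstance(link, int):
            raise TypeError(f"link must be an int, got {type(link).__name__}")
        if not 0 <= link < NVLINK_MAX_LINKS:
            raise ValueError(
                f"link must be in range [0, {NVLINK_MAX_LINKS}), got {link}"
            )
        return NvlinkInfoSketch(self, link)
```

With this shape, `DeviceSketch().nvlink(-1)` raises `ValueError` at the call site rather than constructing an object that fails later.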
Low: NvlinkInfo.version documents a non-existent return type
The public enum exported by cuda.core.system is NvlinkVersion, and the API
index plus tests use that spelling, but NvlinkInfo.version is annotated and
documented as NvLinkVersion. That leaks a wrong type name into the generated
help/doc output and points users at a symbol that does not exist.
Relevant paths:
cuda_core/cuda/core/system/_nvlink.pxi:21
cuda_core/docs/source/api.rst:225
cuda_core/tests/system/test_system_device.py:747
Low: NvlinkInfo.state has no direct test coverage
The new test_nvlink() checks construction of NvlinkInfo and accesses
.version, but it never reads .state. As a result, the wrapper path behind
NvlinkInfo.state has no direct coverage even on systems where the test does
not skip.
Relevant paths:
cuda_core/cuda/core/system/_nvlink.pxi:35
cuda_core/tests/system/test_system_device.py:734
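One way the missing coverage could be added is a small check that actually reads `.state`. The sketch below assumes `.state` returns a `bool` (the diff compares the NVML state against `FEATURE_ENABLED`, which suggests this); `FakeLink` and `check_link_state` are hypothetical names used so the example runs without a GPU, and are not part of the test suite.

```python
class FakeLink:
    """Hypothetical stand-in for NvlinkInfo so the check runs without NVML."""

    def __init__(self, enabled):
        self._enabled = enabled

    @property
    def state(self):
        # Assumption: the real property reduces the NVML state to a bool.
        return self._enabled


def check_link_state(link):
    """Read .state once and verify it has the assumed type."""
    state = link.state
    assert isinstance(state, bool), f"unexpected type: {type(state).__name__}"
    return state
```

In the real test, the same check would be applied to the object returned by `device.nvlink(...)` inside `test_nvlink()`, next to the existing `.version` access.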
Thanks for having your agent fight with my agent, @rwgk. ;)
…pyterlab-nvdashboard
    nvml.device_get_nvlink_state(self._device._handle, self._link) == nvml.EnableState.FEATURE_ENABLED
)

max_links = nvml.NVLINK_MAX_LINKS
I think class-level variables already are read-only:
>>> d.nvlink(0).max_links = 23
Traceback (most recent call last):
File "<python-input-5>", line 1, in <module>
d.nvlink(0).max_links = 23
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'cuda.core.system._device.NvlinkInfo' object attribute 'max_links' is read-only
A cdef variable would not be available from Python.
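The read-only behavior shown in the traceback can be approximated in pure Python for readers without a Cython build at hand. This is only an analogy: `LinkSketch` is a hypothetical class, and the empty `__slots__` is used here to mimic an extension type that has no per-instance `__dict__`; it is not how `NvlinkInfo` is implemented.

```python
class LinkSketch:
    """Hypothetical class with no instance __dict__, like an extension type."""

    __slots__ = ()  # no per-instance attribute storage
    max_links = 18  # class-level value; placeholder number


def try_assign(obj, value):
    """Attempt to rebind max_links on an instance; report whether it worked."""
    try:
        obj.max_links = value
    except AttributeError:
        # The attribute lives on the class and the instance has nowhere
        # to store an override, so the assignment is rejected.
        return False
    return True
```

Reading `LinkSketch().max_links` still works from Python, which is the point of the comment above: a class-level attribute stays visible and read-only on instances, whereas a `cdef` variable would not be reachable at all.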
These APIs are needed by rapidsai/jupyterlab-nvdashboard and rapidsai/rapids-cli.